YJTI at the NTCIR-13 STC Japanese Subtask

Author

  • Toru Shimizu
Abstract

In this paper, we describe our participation in the NTCIR-13 STC Japanese Subtask, for which we developed systems using the retrieval-based method. To retrieve reply texts for a given comment text, our system generates vector representations of both the comment and the candidate replies with a 3-layer LSTM-RNN and evaluates the distances between the comment vector and the candidate reply vectors, selecting the top-k nearest reply vectors and returning the corresponding reply texts. We also take Theme and Genre into consideration to decide the final ranking. To prepare the candidate reply texts, we use all the comments and replies in the training set of the Yahoo! News comments data. Our two runs are based on two different LSTM-RNN models, one trained on Twitter conversation data and the other trained mainly on Yahoo! Chiebukuro QA data. Each dataset has no fewer than 60 million text pairs, and we aim to show how effective these combinations of large-scale datasets and large-scale neural models are for developing dialog systems. In addition, we had assumed that the model trained on Twitter conversation data would outperform the one trained on Yahoo! Chiebukuro QA data, since the task domain seemed more similar to conversations in microblog services than to social question answering, but the reported results revealed that this was not the case.
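
The retrieval step described in the abstract (compare a comment vector against candidate reply vectors and return the top-k nearest) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distance measure is used, so Euclidean distance is assumed here; the function names `euclidean` and `top_k_replies` are hypothetical; and the toy 3-dimensional vectors stand in for the outputs of the LSTM-RNN encoder, which is not reproduced. The Theme and Genre adjustment to the final ranking is omitted because the abstract gives no details about it.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_replies(
    comment_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[float, str]]:
    """Return (distance, reply_text) for the k candidates nearest to comment_vec."""
    scored = ((euclidean(comment_vec, vec), text) for text, vec in candidates)
    # nsmallest orders the tuples by distance first, nearest candidate first.
    return heapq.nsmallest(k, scored)


# Made-up vectors standing in for encoder outputs; not real data.
candidates = [
    ("reply A", [0.9, 0.1, 0.0]),
    ("reply B", [0.0, 1.0, 0.0]),
    ("reply C", [0.8, 0.2, 0.1]),
]
comment_vec = [1.0, 0.0, 0.0]

ranked = top_k_replies(comment_vec, candidates, k=2)
for distance, text in ranked:
    print(f"{text}: {distance:.3f}")
```

With tens of millions of candidates, an exhaustive scan like this would likely be replaced by an approximate nearest-neighbor index; the sketch only shows the ranking logic.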


Similar resources

Response Generation for Grounding in Communication at NTCIR-13 STC Japanese Subtask

The AITOK team participated in the NTCIR-13 STC Japanese Subtask. This report describes our approach to generating responses to comment texts from the Yahoo! News comments data and discusses our formal-run results. Our approach aims to ensure grounding in communication and therefore integrates three strategies and five rules. The strategies are on the presupposition that there is not enough infor...


Overview of the NTCIR-12 Short Text Conversation Task

We describe an overview of the NTCIR-12 Short Text Conversation (STC) task, which is a new pilot task of NTCIR-12. STC consists of two subtasks: a Chinese subtask using post-comment pairs crawled from Weibo, and a Japanese subtask providing the IDs of such pairs from Twitter. Thus, the main difference between the two subtasks lies in the sources and languages of the test collections. For the Ch...


YUILA at the NTCIR-12 Short Text Challenge: Combining Twitter Data with Dialogue System Logs

The YUILA team participated in the Japanese subtask of the NTCIR-12 Short Text Challenge task. This report describes our approach to solving the responsiveness problem in the STC task by using an external dialogue log corpus and discusses the official results.


OKSAT at NTCIR-12 Short Text Conversation Task: Priority to Short Comments, Filtering by Characteristic Words and Topic Classification

Our group, OKSAT, submitted five runs for the Chinese and Japanese subtasks of the NTCIR-12 Short Text Conversation (STC) task. We searched not only posts but also comments for the terms of each query (post). We also gave higher priority to short comments than to longer ones. We then filtered the retrieved comments by characteristic words, including proper nouns. We added attributes to the corpus and also to the q...


SG01 at the NTCIR-13 STC-2 Task

We describe how we built our system for the NTCIR-13 Short Text Conversation (STC) Chinese subtask. In our system, we use both a retrieval-based method and a generation-based method. For the retrieval-based method, we develop several features to match the candidates and then apply a learning-to-rank algorithm to get properly ranked results. For the generation-based method, we first gener...



Publication date: 2017